Abstract — Multiplication is a fundamental arithmetic operation
that occurs frequently in digital signal processing (DSP)
systems, communication systems, and other application-specific
integrated circuits. Multipliers are relatively complex units
and largely determine the overall speed, area, and power
consumption of digital computers; in DSP systems they usually
contribute significantly to the time delay and take up a large
silicon area. This thesis addresses parallel multipliers because
of their speed advantage. Parallel multipliers are combinational
circuits and can be subjected to any standard combinational
logic optimization. Two types of parallel multiplier are
discussed: the Braun multiplier (carry-save array multiplier)
and the tree multiplier. A 4x4 unsigned array multiplier and a
4x4 tree multiplier are designed using different circuit
techniques for the 1-bit full adder, XOR2 and AND2 functions.

Each circuit technique follows a distinct structural pattern
intended to improve performance in terms of low power, minimal
delay and optimal power-delay product (PDP). The 1-bit full
adder, AND2 and XOR2 functions are basic components of many
larger circuits. The circuit styles used for the adders, AND2
and XOR2 are the CMOS logic style, complementary pass-transistor
logic (CPL) and double pass-transistor logic (DPL). Because the
power and speed of these circuits affect the overall performance
of the multiplier circuits, their individual performances are
also compared and discussed in the thesis.

The Braun multiplier and the tree multiplier are implemented in
CMOS, CPL and DPL logic, with the main objective of calculating
the average power, delay and PDP of the 4x4 multipliers for each
circuit technique and comparing their performance. All circuits
are designed and simulated in 90 nm technology with a 2.5 V
supply. Finally, layouts of the Braun multiplier and the tree
multiplier are designed using CMOS logic, and layouts of all the
basic circuits (AND2, XOR2 and full adder) are designed using
CMOS, CPL and DPL logic. The layouts of the basic gates, the
tree multiplier and the Braun multiplier are verified by their
corresponding waveforms.

Index Terms — Braun Multiplier, Tree Multiplier, XOR2, DSP
System.

I. INTRODUCTION 

Multiplication is an important fundamental function in
arithmetic operations. Multipliers are frequently used in
applications such as DSP, where they usually contribute
significantly to the time delay and take up a large silicon
area. Speed is the major factor in multiplier design because it
dominates the execution time of the system, but with the growing
demand for computing power, minimizing power dissipation while
maintaining the same performance is also desired. Early
multiplier designs focused on high-speed operation or low
circuit complexity. With the advance of VLSI technology,
however, computation speed has improved at a steady pace, and
power/energy consumption has instead become an increasingly
prominent design factor with the prevalence of battery-operated
mobile devices. The binary multiplier is one of the most
commonly used circuits in digital devices.

There are various types of multipliers, depending on the
application in which they are used. They can be broadly
classified as parallel, serial and serial-parallel multipliers.
Parallel multipliers are the fastest but occupy more area than
serial multipliers, whereas serial-parallel multipliers are a
trade-off between the two. Previously, the main challenge for
the IC designer was to reduce chip area; the next demand was to
increase processing speed for fast calculations. Area and speed,
however, are two conflicting constraints. This thesis discusses
parallel multipliers because of their high speed and large
number of applications.

This thesis discusses two types of parallel multipliers, the
Braun multiplier (carry-save array multiplier) and the tree
multiplier. Both are unsigned multipliers and therefore operate
on unsigned binary numbers only. A 4x4 unsigned carry-save array
multiplier and a 4x4 tree multiplier are designed using
different circuit techniques for the 1-bit full adder, XOR2 and
AND2 functions. The circuit techniques used are CMOS logic, CPL
logic and DPL logic.
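
As a rough behavioural illustration of the 4x4 unsigned
carry-save array (Braun) multiplier named above, the sketch
below models the array at bit level in Python. It is not the
schematic used in the thesis: the function names, the
LSB-first bit ordering and the use of a full adder with a zero
input in place of a half adder are all assumptions made for
illustration.

```python
def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def braun_multiply(a, b, n=4):
    """Behavioural model of an n x n unsigned carry-save array multiplier."""
    width = 2 * n
    # AND2 array: pp[i][j] is the partial product a_j AND b_i (weight 2**(i+j)).
    pp = [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)] for i in range(n)]

    sum_bits = [0] * width
    carry_bits = [0] * width
    for j in range(n):
        sum_bits[j] = pp[0][j]

    # Each later row adds one partial-product row. Carries are saved and
    # handed to the next row instead of rippling along the current row.
    for i in range(1, n):
        next_carry = [0] * width
        for j in range(n):
            col = i + j
            sum_bits[col], next_carry[col + 1] = full_adder(
                pp[i][j], sum_bits[col], carry_bits[col]
            )
        carry_bits = next_carry

    # Final ripple-carry row merges the remaining sum and carry vectors.
    ripple = 0
    for col in range(n, width):
        sum_bits[col], ripple = full_adder(sum_bits[col], carry_bits[col], ripple)

    return sum(bit << k for k, bit in enumerate(sum_bits))
```

For example, `braun_multiply(13, 11)` can be checked against
the integer product `13 * 11`.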

Each circuit technique follows a distinct structural pattern
intended to improve performance in terms of low power, minimal
delay and optimal PDP. The performance of the AND2 gate, the
XOR2 gate and the full adder (FA) in the CMOS, CPL and DPL logic
styles is also compared, as the power and speed of these
circuits affect the overall performance of the multiplier
circuits.

II. 1-BIT FULL ADDER

The full adder is one of the basic building blocks of many
digital VLSI circuits. A full adder adds two 1-bit binary
numbers together with a carry-in. It has three inputs, two for
the input bits A and B and one for the carry-in Cin, and two
outputs, the Sum and the Carry-out.

Figure 1. Logic circuit of the conventional full adder

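
The behaviour described above can be sketched in a few lines of
Python. The gate decomposition used here (two XOR2, two AND2 and
one OR2) is the common textbook form and is assumed, not taken
from Figure 1; the names `full_adder` and `truth_table` are
illustrative only.

```python
from itertools import product


def full_adder(a, b, c_in):
    """Return (sum, carry_out) for three 1-bit inputs."""
    half_sum = a ^ b                        # first XOR2
    total = half_sum ^ c_in                 # second XOR2
    c_out = (a & b) | (half_sum & c_in)     # two AND2 gates and one OR2
    return total, c_out


# Enumerate all eight input combinations as rows (A, B, Cin, Sum, Carry-out).
truth_table = [
    (a, b, c, *full_adder(a, b, c)) for a, b, c in product((0, 1), repeat=3)
]
```

Each row satisfies A + B + Cin = Sum + 2 * Carry-out, which is
the property the multiplier arrays rely on.
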
III. CMOS FULL ADDER 

Logic gates in conventional or complementary CMOS are built
from an NMOS pull-down network and a dual PMOS pull-up network.
Any logic function can be realized with NMOS pull-down and PMOS
pull-up networks. Other advantages of the CMOS logic style are
its robustness against voltage scaling and transistor sizing
(high noise margins), and thus reliable operation at low
voltages and arbitrary transistor sizes (ratioless logic). Input
signals are connected to transistor gates only, which
facilitates the usage and characterization of logic cells. The
layout of CMOS gates is straightforward and efficient due to the
complementary transistor pairs. CMOS thus fulfills all the
requirements regarding the ease of use of logic gates. An
often-mentioned disadvantage of complementary CMOS is the
substantial number of large PMOS transistors, resulting in high
input loads. However, the best gate performance is achieved with
a PMOS/NMOS width ratio of only about 1.5 [17]. Another drawback
of CMOS is the relatively weak output driving capability due to
series transistors in the output stage. This can be corrected by
additional output buffers/inverters, which are in any case
inherent in other logic styles.
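
The pull-down/pull-up duality can be illustrated with a small
switch-level model in Python. The sketch assumes that the AND2
function is built as a NAND2 followed by an inverter, which is
the usual static CMOS arrangement but is not confirmed by the
text; all function names are hypothetical.

```python
from itertools import product


def pull_down_nand2(a, b):
    """Series NMOS pair: connects the output to ground only when a AND b."""
    return bool(a and b)


def pull_up_nand2(a, b):
    """Dual parallel PMOS pair: connects the output to VDD when a or b is 0."""
    return (not a) or (not b)


def cmos_nand2(a, b):
    down = pull_down_nand2(a, b)
    up = pull_up_nand2(a, b)
    # Exactly one network conducts for every input combination, so the
    # output is always driven and there is no static VDD-to-ground path.
    assert down != up
    return 1 if up else 0


def cmos_and2(a, b):
    """Assumed AND2 structure: NAND2 followed by an inverter."""
    return 1 - cmos_nand2(a, b)


and2_table = {(a, b): cmos_and2(a, b) for a, b in product((0, 1), repeat=2)}
```
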


IV. CPL 1-BIT FULL ADDER

CPL is a pass-transistor logic style. The basic difference
between pass-transistor logic and the CMOS logic style is that
the source side of the logic transistor networks is connected to
some input signals instead of the power lines. The advantage is
that one pass-transistor network (either NMOS or PMOS) is
sufficient to perform the logic operation, which results in a
smaller number of transistors and smaller input loads,
especially when NMOS networks are used. However, the threshold
voltage drop through the NMOS transistors while passing a logic
“1” makes swing (or level) restoration at the gate outputs
necessary in order to avoid static currents at the subsequent
output inverters or logic gates. Adjusting the threshold
voltages as a solution at the process technology level is
usually not feasible for other reasons. In order to decouple
gate inputs and outputs and to provide acceptable output driving
capabilities, inverters are usually attached to the gate
outputs.

Because the MOS networks are connected to variable gate inputs
rather than constant power lines, only one signal path through
each network must be active at a time in order to avoid shorts
between inputs. Therefore, each pass-transistor network must
realize a multiplexer structure, which limits the number of
logic functions that can be implemented efficiently. Because
these pass-transistor multiplexer structures require
complementary control signals, dual-rail logic is usually used
in order to provide all signals in complementary form. As a
consequence, two MOS networks are again required in addition to
the swing restoration and output buffering circuitry, which all
in all cancels the advantage of low transistor count and small
input loads of pass-transistor logic. Also, the required double
inter-cell wiring increases wiring complexity and capacitance
considerably. A small advantage of dual-rail logic is that
inverted signals come for free.

Layout of pass-transistor cells is not as straightforward and
efficient, due to rather irregular transistor arrangements and
high wiring requirements. Finally, pass-transistor logic with
swing restoration circuitry is sensitive to voltage scaling and
transistor sizing with respect to circuit robustness (reduced
noise margins), i.e., efficient or reliable operation of logic
gates is not necessarily guaranteed at low voltages or small
transistor sizes. In other words, transistor sizing is crucial
for correct gate operation and therefore more difficult (ratioed
logic). Short-circuit currents are rather large due to competing
signals in the swing restoration circuitry. Many different
pass-transistor logic styles have been proposed [3].

Figure 3. Transistor diagram of CPL full adder


V. TREE MULTIPLIER 

In a tree multiplier [12], the adders that sum the partial
products are arranged in a tree-like fashion, reducing both the
critical path and the number of adder cells needed. The
objective is to reduce the number of adder elements and the
depth of the tree. A 1-bit adder is used as a 3:2 compressor,
which takes three inputs and produces two outputs. If the truth
table of the 1-bit adder is examined, it can be seen that an
adder is in effect a "ones counter" that counts the number of
1's on the A, B and C inputs and encodes the count on the sum
and carry outputs. The addition of partial products in a column
of the array is equivalent to counting the number of 1's in that
column, with the carry being passed to the next column to the
left.
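
The following Python sketch illustrates the idea of reducing
partial-product rows with 3:2 compressors and then applying a
final carry-propagate adder (CPA). It is a behavioural model
under stated assumptions, not the circuit of the thesis: rows
are grouped three at a time in the order they appear, the CPA
is modelled by ordinary integer addition, and all function
names are hypothetical.

```python
def full_adder(a, b, c):
    """3:2 compressor for one column: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compress_3_2(x, y, z, width):
    """Carry-save add three LSB-first bit vectors into two vectors."""
    sum_vec = [0] * width
    carry_vec = [0] * width
    for col in range(width):
        s, c = full_adder(x[col], y[col], z[col])
        sum_vec[col] = s
        if col + 1 < width:
            carry_vec[col + 1] = c   # carry moves one column to the left
    return sum_vec, carry_vec


def tree_multiply(a, b, n=4):
    """Behavioural model of an n x n unsigned tree multiplier."""
    width = 2 * n
    # One shifted partial-product row per multiplier bit (AND2 array).
    rows = []
    for i in range(n):
        row = [0] * width
        for j in range(n):
            row[i + j] = ((a >> j) & 1) & ((b >> i) & 1)
        rows.append(row)

    # Reduce three rows to two until only two rows remain.
    while len(rows) > 2:
        groups = len(rows) // 3
        reduced = []
        for g in range(groups):
            reduced.extend(compress_3_2(*rows[3 * g:3 * g + 3], width))
        reduced.extend(rows[3 * groups:])   # leftover rows pass through
        rows = reduced

    # Final CPA, modelled here with integer addition.
    def to_int(bits):
        return sum(bit << k for k, bit in enumerate(bits))

    return sum(to_int(r) for r in rows) % (1 << width)
```

No carry ripples inside `compress_3_2`; each column is handled
independently, which is why the reduction stages are fast and
only the final adder propagates carries.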



Figure 4. Schematic circuit of 4 x 4 tree multiplier

To obtain the total propagation time, the delay of the final CPA
must be added to the propagation time of the array. The delay
through the array addition is proportional to $\log_{3/2} n$,
where n is the width of the tree. In a simple array multiplier
it is proportional to n. The high speed of tree multipliers is
due to the 1-bit adders used as 3:2 compressors, which avoid
carry propagation. The other advantage of the tree multiplier is
a substantial reduction in hardware for large multipliers. The
main disadvantage of the tree multiplier is that its
architecture exhibits irregularities in the layout because it
has a relatively complicated interconnection scheme.
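
The logarithmic growth can be checked with a short Python
sketch that counts how many 3:2 compressor levels are needed to
bring n partial-product rows down to two. The grouping rule
(every full group of three rows becomes two, leftovers pass
through) is an assumption for illustration, and the function
names are hypothetical.

```python
import math


def csa_levels(rows):
    """Number of 3:2 compressor levels needed to reduce `rows` rows to 2."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels


def log_estimate(rows):
    """Continuous estimate log_{3/2}(rows / 2) of the same quantity."""
    return math.log(rows / 2, 1.5)


# Level counts for a few operand widths; 4 rows need 2 levels.
level_table = {n: csa_levels(n) for n in (4, 8, 16, 32)}
```

Doubling the operand width adds only a couple of levels,
whereas a structure that absorbs one row per level grows
linearly with n.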

VI. SCHEMATIC AND SIMULATION RESULT OF CMOS AND GATE

Figure 5. Schematic of CMOS AND gate

Figure 6. Simulation result of CMOS AND gate

Schematic and Simulation Result of CPL AND gate

Figure 7. Schematic of CPL AND gate

Figure 8. Simulation result of CPL AND gate

Schematic and Simulation Result of CMOS 1-bit full adder

Figure 9. Schematic of CMOS 1-bit full adder

Figure 10. Simulation result of CMOS 1-bit full adder

VII. CONCLUSION

In this thesis two types of parallel multipliers, the Braun
multiplier and the tree multiplier, are discussed using CMOS,
CPL and DPL logic. A Braun multiplier consists of AND gates,
half adders and full adders. Because the performance of these
sub-components affects the performance of the Braun multiplier,
their individual performances are also compared on the basis of
average power consumed, propagation delay, PDP and number of
transistors. The comparison is presented in Tables 6.1, 6.2 and
6.3. For the AND gate, CMOS logic has the minimum number of
transistors and the minimum power consumption, whereas the
minimum delay is obtained with DPL logic. For the half adder,
CPL logic has the minimum number of transistors, whereas the
minimum delay and power consumption are found for DPL logic. For
the full adder, CPL and CMOS logic each use 28 transistors and
DPL uses 40, and the minimum delay and power consumption are
found for CMOS logic.

VIII. FUTURE SCOPE

The implementation of the Braun multiplier and the tree
multiplier using CMOS, CPL and DPL circuit techniques can be
extended to higher-order Braun and tree multiplier
architectures, and their performance can then be compared. Other
logic styles can also be used to implement the Braun multiplier
and the tree multiplier and evaluated against different
performance parameters. Layouts of the Braun multiplier and tree
multiplier architectures in CPL and DPL logic can also be
designed.